A random-random test for time-series data #132556

Open
wants to merge 12 commits into
base: main
from

Conversation

pabloem
Contributor

@pabloem pabloem commented Aug 7, 2025

Follow up items after this PR:

  • Test rate function and counters in general
  • Randomize window size
  • Increase test size
  • Hit some more corner cases (e.g. zero out some parameters)

@pabloem pabloem marked this pull request as ready for review August 11, 2025 18:06
@pabloem pabloem changed the title from "[wip][do not review] A random-random test for time-series data" to "A random-random test for time-series data" Aug 11, 2025
@elasticsearchmachine elasticsearchmachine added the needs:triage (Requires assignment of a team area label) label Aug 11, 2025
@pabloem pabloem added the >test (Issues or PRs that are addressing/adding tests), :StorageEngine/TSDB (You know, for Metrics), and :StorageEngine/ES|QL (Timeseries / metrics / logsdb capabilities in ES|QL) labels and removed the needs:triage (Requires assignment of a team area label) label Aug 11, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Member

@not-napoleon not-napoleon left a comment

I'd like to get some confirmation from @kkrik-es that this is doing what he wants, but I think it's pretty good. I left some feedback, none of which is critical but I'd like to get it addressed.

@@ -78,6 +79,7 @@ public FieldDataGenerator generator(String fieldName, DataSource dataSource) {
case IP -> new IpFieldDataGenerator(dataSource);
case CONSTANT_KEYWORD -> new ConstantKeywordFieldDataGenerator();
case WILDCARD -> new WildcardFieldDataGenerator(dataSource);
default -> null;
Member

I would prefer not to have a default in this switch. When someone adds a new FieldType here, I want the compiler to tell them they must also update this switch, but the default will hide that.

In fact, this probably shouldn't be a switch. It should probably be an abstract method or a function member on the enum itself. Switches on enums are a bit of a smell, and switches on enums from within the enum itself are so smelly as to almost be an error, IMHO. I realize you didn't add this switch, but this is a good opportunity to refactor it and leave it better than we found it.
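
A minimal sketch of the "function member on the enum" idea, under stated assumptions: `SketchFieldType` and `SketchGenerator` are hypothetical stand-ins, not the real `FieldType` or `FieldDataGenerator`, and the real method also takes a field name and a `DataSource`. Each constant must pass a factory to the constructor, so adding a constant without one fails to compile.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the real FieldDataGenerator interface.
interface SketchGenerator {
    Object generate();
}

// Hypothetical enum: every constant carries its own factory, so there is no switch to keep in sync.
enum SketchFieldType {
    KEYWORD(() -> () -> "some-keyword"),
    IP(() -> () -> "192.0.2.1"),
    // A type without a generator has to say so explicitly instead of falling into a default branch.
    UNSUPPORTED(() -> {
        throw new UnsupportedOperationException("no generator for this type");
    });

    private final Supplier<SketchGenerator> factory;

    SketchFieldType(Supplier<SketchGenerator> factory) {
        this.factory = factory;
    }

    // Replaces the switch: delegate to the factory attached to this constant.
    SketchGenerator generator() {
        return factory.get();
    }
}
```

With this shape, a caller would write something like `SketchFieldType.IP.generator().generate()`, and unsupported types fail loudly rather than returning null.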

Contributor Author

maybe we can compromise on removing the default? 😅

private List<XContentBuilder> documents = null;
private DataGenerationHelper dataGenerationHelper;

private static final class DataGenerationHelper {
Member

I wonder if this should be a top level class. Seems like we'll want to build multiple test classes using this framework.

Contributor Author

I've moved this class to its own file! TY.

private static Object randomDimensionValue(String dimensionName) {
// We use dimensionName to determine the type of the value.
var isNumeric = dimensionName.hashCode() % 5 == 0;
if (isNumeric) {
Member

What about IP dimensions?

Contributor Author

added 20% of dimensions as IP-like.
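
One way to keep the type choice consistent per dimension name while adding an IP-like bucket can be sketched as follows. This is an illustration, not the code in this PR: `DimensionKindSketch` and its `Kind` enum are hypothetical names, and the "about 20%" share per bucket assumes hash codes spread evenly across names.

```java
// Hypothetical helper: derive a dimension's kind deterministically from its name.
final class DimensionKindSketch {
    enum Kind { NUMERIC, IP, KEYWORD }

    static Kind kindFor(String dimensionName) {
        // floorMod keeps the bucket in [0, 5) even when hashCode() is negative.
        // With plain %, a negative hash gives a negative remainder, so a check
        // such as "% 5 == 1" would silently skip those names.
        int bucket = Math.floorMod(dimensionName.hashCode(), 5);
        if (bucket == 0) {
            return Kind.NUMERIC; // about 20% of names, assuming evenly spread hashes
        }
        if (bucket == 1) {
            return Kind.IP; // about 20% of names, same assumption
        }
        return Kind.KEYWORD;
    }
}
```

Because the result depends only on the name, every document that carries the same dimension gets the same kind of value.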

Contributor Author

As a follow-up, I'll add dynamic mapping to parse these as IP. Thoughts?

Contributor Author

@pabloem pabloem left a comment

TY @not-napoleon - ptal!

var docValues = windowDataPoints.stream()
.map(doc -> ((Map<String, Integer>) doc.get("metrics")).get("gauge_hdd.bytes.used"))
.toList();
// Verify that the first column is the max value (the query gets max, avg, min in that order)
Contributor

In a follow-up, you can avoid hard-coding by defining an enum for each function, with corresponding validation logic.
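
A rough sketch of what such an enum could look like, assuming integer gauge values as in the snippet above. `AggSketch` and its methods are hypothetical names for illustration, not what the PR ended up with.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical enum: each aggregation knows how to compute its own reference value.
enum AggSketch {
    MAX(values -> values.stream().mapToInt(Integer::intValue).max().orElseThrow()),
    AVG(values -> values.stream().mapToInt(Integer::intValue).average().orElseThrow()),
    MIN(values -> values.stream().mapToInt(Integer::intValue).min().orElseThrow());

    private final ToDoubleFunction<List<Integer>> reference;

    AggSketch(ToDoubleFunction<List<Integer>> reference) {
        this.reference = reference;
    }

    // Reference value computed on the test side from the raw document values.
    double expected(List<Integer> docValues) {
        return reference.applyAsDouble(docValues);
    }

    // Validation: compare the value returned by the query against the reference value.
    boolean matches(List<Integer> docValues, double actual, double tolerance) {
        return Math.abs(expected(docValues) - actual) <= tolerance;
    }
}
```

A test could then iterate over the functions used in the query and call `matches` for each result column, instead of relying on fixed column positions.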

Contributor Author

done!

// Verify that the second column is the avg value (thus why row.get(2))
docValues.stream().mapToDouble(Integer::doubleValue).average().ifPresentOrElse(avgValue -> {
var res = (Double) row.get(2);
assertThat(res, closeTo(avgValue, res * 0.5));
Contributor

Why do we need the 0.5 factor?

Contributor Author

that was a mistake (I meant to do 5% not 50%).
However, average calculation does seem to have up to 20-25% error between ES and test-framework numbers. Should I check if that's a bug and how to deal with it?

Contributor

There should be no error here (Double.compare() should return 0), so there's a bug somewhere. Let's investigate separately.
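
For reference, the two kinds of check discussed in this thread can be sketched side by side. `AvgCheckSketch` is a hypothetical helper, and scaling the tolerance by the expected value (rather than by the returned value, as `res * 0.5` does in the quoted assertion) is an assumption about what was intended.

```java
// Hypothetical helper contrasting an exact check with a relative-tolerance check.
final class AvgCheckSketch {
    // Exact check: true only when Double.compare reports the two values as equal.
    static boolean exactlyEqual(double expected, double actual) {
        return Double.compare(expected, actual) == 0;
    }

    // Relative check: the allowed error is a fraction of the expected value,
    // e.g. fraction = 0.05 for 5%.
    static boolean withinRelative(double expected, double actual, double fraction) {
        return Math.abs(expected - actual) <= Math.abs(expected) * fraction;
    }
}
```

With an expected average of 100.0, a result of 125.0 passes a 50% tolerance but fails a 5% one, which is why the wider factor could hide the discrepancy described above.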


private static Object randomDimensionValue(String dimensionName) {
// We use dimensionName to determine the type of the value.
var isNumeric = dimensionName.hashCode() % 5 == 0;
Contributor

Nit: let's just use randomDouble() < 0.2. Same below.

Contributor

I see, this needs to be consistent per field. Why not rely on the data generation framework to provide these values? You can start simple with all dimensions being keyword fields - not dynamic, no pass-through subfields.

Contributor Author

I tried to closely match an OpenTelemetry use case, so a bunch of dynamic dimensions under attributes... Do you think that's problematic? I can also add a non-dynamic test case as a follow-up. Thoughts?

Contributor

It's not problematic, but it complicates things somewhat. We'll need it eventually; I was wondering whether we should start with something simpler. Up to you, this works as well.

Contributor

@kkrik-es kkrik-es left a comment

Thanks Pablo, this is a good step. It's nice that you tried to include the pass-through field on the first take, though that complicates things somewhat. I'd start with statically defined dimension and metric fields to get the validation logic in place first, then introduce dynamic fields on top of that.

Let's try to refactor the logic slightly so that it can be further extended in follow-up PRs.
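
As an illustration of "statically defined dimension and metric fields", a mapping could be built along these lines. The field names `host` and `hdd_bytes_used` and the class `StaticTsdbMappingSketch` are hypothetical; `time_series_dimension` and `time_series_metric` are the Elasticsearch mapping parameters for TSDB dimensions and metrics.

```java
import java.util.Map;

// Hypothetical builder for a fixed TSDB mapping: one keyword dimension, one gauge metric.
final class StaticTsdbMappingSketch {
    static Map<String, Object> mapping() {
        Map<String, Object> timestamp = Map.of("type", "date");
        // A keyword field marked as a time-series dimension.
        Map<String, Object> host = Map.of("type", "keyword", "time_series_dimension", true);
        // A numeric field marked as a gauge metric.
        Map<String, Object> metric = Map.of("type", "long", "time_series_metric", "gauge");
        Map<String, Object> properties = Map.of(
            "@timestamp", timestamp,
            "host", host,
            "hdd_bytes_used", metric
        );
        return Map.of("properties", properties);
    }
}
```

Starting from a fixed mapping like this keeps the expected type of every field known up front, which makes the validation logic simpler to write before dynamic fields are layered on.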

return List.of(
DataStreamsPlugin.class,
LocalStateCompositeXPackPlugin.class,
// Downsample.class, // TODO(pabloem): What are these
Contributor

Remove?

Contributor

@kkrik-es kkrik-es left a comment

Looks good, thanks for addressing the comments. Let's keep iterating.

Labels
:StorageEngine/ES|QL (Timeseries / metrics / logsdb capabilities in ES|QL), :StorageEngine/TSDB (You know, for Metrics), Team:StorageEngine, >test (Issues or PRs that are addressing/adding tests), v9.2.0
4 participants